
Strategies d'échantillonnage pour l'apprentissage par renforcement batch

Internal identifier: 001686 (Main/Exploration); previous: 001685; next: 001687

Authors: Raphael Fonteneau [Belgium]; Susan A. Murphy [United States]; Louis Wehenkel [Belgium]; Damien Ernst [Belgium]

Source:

Revue d'intelligence artificielle [ISSN 0992-499X]; 2013.

RBID : Pascal:13-0216765

French descriptors

Pascal (Inist)
- Apprentissage renforcé, Intelligence artificielle, Action, Système actif, Apprentissage supervisé, Commande optimale, Contrôle optimal, Politique optimale, Echantillonnage, Algorithme apprentissage, Identification système, Ajustement modèle, Méthode espace état, Espace état.
Wicri:
- topic: Intelligence artificielle.

English descriptors

KwdEn :
- Action, Active system, Artificial intelligence, Learning algorithm, Model matching, Optimal control, Optimal control (mathematics), Optimal policy, Reinforcement learning, Sampling, State space, State space method, Supervised learning, System identification.

Abstract

We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.

Affiliations:

Links toward previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000063
to stream PascalFrancis, to step Curation: 000944
to stream PascalFrancis, to step Checkpoint: 000029
to stream Main, to step Merge: 001700
to stream Main, to step Curation: 001686

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="fr" level="a">Strategies d'échantillonnage pour l'apprentissage par renforcement batch</title>
<author><name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName><settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
<author><name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Université du Michigan</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName><settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
<author><name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName><settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">13-0216765</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0216765 INIST</idno>
<idno type="RBID">Pascal:13-0216765</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000063</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000944</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000029</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000029</idno>
<idno type="wicri:doubleKey">0992-499X:2013:Fonteneau R:strategies:d:echantillonnage</idno>
<idno type="wicri:Area/Main/Merge">001700</idno>
<idno type="wicri:Area/Main/Curation">001686</idno>
<idno type="wicri:Area/Main/Exploration">001686</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="fr" level="a">Strategies d'échantillonnage pour l'apprentissage par renforcement batch</title>
<author><name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName><settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
<author><name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Université du Michigan</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName><settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
<author><name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Belgique</country>
<placeName><settlement type="city">Liège</settlement>
<region type="region" nuts="1">Région wallonne</region>
<region type="province" nuts="1">Province de Liège</region>
</placeName>
<orgName type="university">Université de Liège</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Action</term>
<term>Active system</term>
<term>Artificial intelligence</term>
<term>Learning algorithm</term>
<term>Model matching</term>
<term>Optimal control</term>
<term>Optimal control (mathematics)</term>
<term>Optimal policy</term>
<term>Reinforcement learning</term>
<term>Sampling</term>
<term>State space</term>
<term>State space method</term>
<term>Supervised learning</term>
<term>System identification</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Apprentissage renforcé</term>
<term>Intelligence artificielle</term>
<term>Action</term>
<term>Système actif</term>
<term>Apprentissage supervisé</term>
<term>Commande optimale</term>
<term>Contrôle optimal</term>
<term>Politique optimale</term>
<term>Echantillonnage</term>
<term>Algorithme apprentissage</term>
<term>Identification système</term>
<term>Ajustement modèle</term>
<term>Méthode espace état</term>
<term>Espace état</term>
<term>.</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Intelligence artificielle</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.</div>
</front>
</TEI>
<affiliations><list><country><li>Belgique</li>
<li>États-Unis</li>
</country>
<region><li>Province de Liège</li>
<li>Région wallonne</li>
</region>
<settlement><li>Liège</li>
</settlement>
<orgName><li>Université de Liège</li>
</orgName>
</list>
<tree><country name="Belgique"><region name="Région wallonne"><name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
</region>
<name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
</country>
<country name="États-Unis"><noRegion><name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
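
The record above can be read programmatically. The following is a minimal sketch in Python, assuming the record is available as a string without an XML declaration. Note that the record uses the wicri: and inist: prefixes without declaring them, so a strict XML parser rejects it as-is; the sketch wraps the record in an element that binds both prefixes to placeholder URIs. The URIs, the wrapper element, and the function names parse_record and summarize are illustrative assumptions, not part of Dilib or Wicri. A trimmed copy of the record stands in for the full document.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs (assumption): the record never declares these prefixes,
# so any URI works as long as the prefixes are bound before parsing.
PLACEHOLDER_NS = {
    "wicri": "urn:example:wicri",
    "inist": "urn:example:inist",
}

# Trimmed copy of the record shown above, kept short for illustration.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="fr" level="a">Strategies d'échantillonnage pour l'apprentissage par renforcement batch</title>
<author><name sortKey="Fonteneau, Raphael">Raphael Fonteneau</name>
<affiliation wicri:level="4"><country>Belgique</country></affiliation></author>
<author><name sortKey="Murphy, Susan A">Susan A. Murphy</name>
<affiliation wicri:level="1"><country>États-Unis</country></affiliation></author>
</titleStmt></fileDesc></teiHeader></TEI></record>"""


def parse_record(xml_text):
    """Parse a record after binding the undeclared prefixes on a wrapper element."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    wrapped = f"<wrapper {decls}>{xml_text}</wrapper>"
    return ET.fromstring(wrapped).find("record")


def summarize(record):
    """Return the title and the (name, country) pairs found in titleStmt."""
    title_stmt = record.find("./TEI/teiHeader/fileDesc/titleStmt")
    authors = [
        (author.findtext("name"), author.findtext("affiliation/country"))
        for author in title_stmt.findall("author")
    ]
    return {"title": title_stmt.findtext("title"), "authors": authors}


if __name__ == "__main__":
    info = summarize(parse_record(SAMPLE))
    print(info["title"])
    for name, country in info["authors"]:
        print(f"{name} [{country}]")
```

Reading only titleStmt avoids counting each author twice, since the same author list is repeated under sourceDesc/biblStruct/analytic.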

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001686 | SxmlIndent | more

Or, assuming EXPLOR_AREA is already set to the root of the InforLorV4 area:

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001686 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:13-0216765
   |texte=   Strategies d'échantillonnage pour l'apprentissage par renforcement batch
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022

Exploration server for computer science research in Lorraine
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
